161 results found.
Written
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
From Owner
License:
Size:
2.4 GByte
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: Class-based LSTM Russian Language Model with Linguistic Information
- Paper track: Speech/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Irina Kipyatkova | Russian text corpus | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
From Owner
License:
Size:
3.5 GByte
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Class-based LSTM Russian Language Model with Linguistic Information
- Paper track: Speech/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Irina Kipyatkova | Corpus of Continuous Russian Speech for Automatic Speech Recognition Systems | /N |
Documentation:
Documentation in Russian language
Multimodal/Multimedia
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
From Owner
License:
Size:
5000 GByte
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Class-based LSTM Russian Language Model with Linguistic Information
- Paper track: Speech/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Irina Kipyatkova | HAVRUS | /N |
Documentation:
Documentation written in Russian
Written
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
Freely Available
License:
MIT
Size:
None
Production Status:
Newly created-in progress
Use:
Question Answering
- Paper title: Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian
- Paper track: Long paper/
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alena Fenogenova | MuSeRC and RuCoS | /N |
Documentation:
https://russiansuperglue.com/
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
English German Russian
Availability:
Freely Available
License:
CreativeCommons
Size:
500 sentences
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: On The Evaluation of Machine Translation Systems Trained With Back-Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Sergey Edunov | Additional human translations for WMT test sets | /N |
Documentation:
Documentation available in English https://github.com/facebookresearch/evaluation-of-nmt-bt
Written
Corpus,
Language Type:
Multilingual
Languages:
Chinese English French Japanese Korean Russian
Availability:
Freely Available
License:
Size:
5000 sentences
Production Status:
Newly created-in progress
Use:
Analysis of cross-linguistic morphosyntactic divergences
- Paper title: Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences
- Paper track: Long/Resources and Evaluation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Dmitry Nikolaev | Aligned sub-corpus of Parallel Universal Dependencies | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
Freely Available
License:
CreativeCommons
Size:
92 GByte
Production Status:
Existing-used
Use:
Language Modelling
- Paper title: Phonetic and Visual Priors for Decipherment of Informal Romanization
- Paper track: Long/Phonology, Morphology and Word Segmentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Maria Ryskina | Taiga Corpus | /N |
Documentation:
https://github.com/TatianaShavrina/taiga_site/blob/master/segments.md
Written
Corpus,
Language Type:
Monolingual
Languages:
Russian
Availability:
Freely Available
License:
Size:
6227 sentences
Production Status:
Newly created-finished
Use:
Text Normalization
- Paper title: Phonetic and Visual Priors for Decipherment of Informal Romanization
- Paper track: Long/Phonology, Morphology and Word Segmentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Maria Ryskina | Dataset of Informally Romanized Russian (Translit) | /N |
Documentation:
None
Written
Treebank,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
- Paper track: Short/Machine Learning for NLP
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Universal Dependencies | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Chinese English French German Italian Japanese Russian Spanish
Availability:
From NIST
License:
Size:
None
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
- Paper title: Why Overfitting Isn't Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
- Paper track: Short/Machine Learning for NLP
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mozhi Zhang | Reuters RCV1/RCV2 Multilingual Corpus | /N |
Documentation:
None




